Evaluation of real-time audio-visual speech recognition

Authors

  • Peng Shen
  • Satoshi Tamura
  • Satoru Hayamizu
Abstract

In this paper, we propose and develop a real-time audio-visual automatic continuous speech recognition system. The system uses live speech signals and facial images collected from a microphone and a camera. Optical-flow-based features are used as the visual features. Voice activity detection (VAD) and lip tracking are used to improve recognition accuracy. Several experiments are conducted using Japanese connected-digit speech contaminated with white noise, music, television news, and car engine noise. The experimental results show that the recognition accuracy of the proposed system is insufficient when the user is listening to the news or is in a running car with the window open. The accuracy of the proposed system is high in a place with light music or in a running car with the window closed.
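The abstract does not say how the noise was added to the speech or at which signal-to-noise ratios. As a minimal sketch of one common way to contaminate clean speech with a noise recording, the following scales the noise so that the mixture reaches a chosen SNR. The helper names (`power`, `mix_at_snr`, `white_noise`), the 16 kHz sampling rate, the synthetic sine "speech", and the 10 dB target are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: the mixing procedure used in the paper is not
    described in the abstract.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def white_noise(n, seed=0):
    """Gaussian white noise with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate
    # A one-second sine tone stands in for a clean speech recording.
    speech = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
              for t in range(sample_rate)]
    noisy = mix_at_snr(speech, white_noise(sample_rate), snr_db=10)
    # Recover the added noise and report the SNR actually achieved.
    residual = [y - s for y, s in zip(noisy, speech)]
    print(round(10 * math.log10(power(speech) / power(residual)), 2))
```

The same function would apply to music, television news, or car engine noise by passing a recording of that noise in place of `white_noise(...)`.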

Similar resources

CENSREC-AV: evaluation frameworks for audio-visual speech recognition

This paper introduces incoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. In order to develop robust speech recognition for noisy environments, bimodal speech recognition, which uses acoustic and visual information, has received particular attention over the past decade. As a lot of methods and techniques for bimodal speech recognition have b...

Real-time audio-visual voice activity detection for speech recognition in noisy environments

Voice activity detection (VAD) is one of the most critical issues on performance degradation of speech recognition in noisy environment applications. A real-time VAD was developed by using face parameters (eye and lip contours) as a front-end for the traditional speech and noise (audio) GMM-based method. Speech recognition performance of the audiovisual VAD is shown to be comparable with audio-o...

Czech audio-visual speech corpus of a car driver for in-vehicle audio-visual speech recognition

This paper presents the design of an audio-visual speech corpus for in-vehicle audio-visual speech recognition. Throughout the world, there exist several audio-visual speech corpora. There are also several (audio-only) speech corpora for in-vehicle recognition. So far, we have not found an audiovisual speech corpus for in-vehicle speech recognition. And, we have not found any audio-visual speec...

Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition

The aim of the present study is to investigate some key challenges of the audio-visual speech recognition technology, such as asynchrony modeling of multimodal speech, estimation of auditory and visual speech significance, as well as stream weight optimization. Our research shows that the use of viseme-dependent significance weights improves the performance of state asynchronous CHMM-based spee...

A Survey – Audio and Video Synchronization

Audio and video synchronization is essential. Loss of synchronization between image and sound continues to disturb viewers and irritate broadcasters. The requirement is to ensure synchronization without altering the content while keeping the cost low. The objective of synchronization is to align the audio and video signals, which are processed separately. T...

Publication date: 2010